Multi-Label Approaches to Web Genre Identification
Abstract
A web page is a complex document that can share the conventions of several genres or contain several parts, each belonging to a different genre. To address this genre interplay properly, recent work in automatic web genre identification has proposed multi-label classification. The dominant approach transforms one multi-label machine learning problem into several sub-problems, each learning a binary single-label classifier for one genre. In this paper we explore the multi-class transformation, in which each combination of genres is assigned a single distinct label. We then compare this approach with the binary approach to determine which better captures the multi-label aspect of web genres. Experimental results show that both approaches failed to address multi-genre web pages properly. The differences obtained resulted from variations in the recognition of single-genre web pages.
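The two problem transformations named in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the toy corpus `PAGES`, the genre names, and both function names are hypothetical, and no classifier is trained. The sketch only shows how the training targets differ between the binary and the multi-class transformation.

```python
from typing import Dict, FrozenSet

# Hypothetical toy corpus (not from the paper): page id -> set of genre labels.
PAGES: Dict[str, FrozenSet[str]] = {
    "p1": frozenset({"blog"}),
    "p2": frozenset({"shop"}),
    "p3": frozenset({"blog", "shop"}),
    "p4": frozenset({"faq", "shop"}),
}


def binary_transformation(pages: Dict[str, FrozenSet[str]]) -> Dict[str, Dict[str, bool]]:
    """Binary approach: one genre-vs-rest target per genre.

    Returns genre -> {page id -> True if the page carries that genre}.
    Each inner mapping would be the target of one binary classifier.
    """
    genres = sorted(set().union(*pages.values()))
    return {g: {pid: g in labels for pid, labels in pages.items()} for g in genres}


def multiclass_transformation(pages: Dict[str, FrozenSet[str]]) -> Dict[str, str]:
    """Multi-class approach: each distinct genre combination becomes one class.

    Returns page id -> a single label naming the whole combination.
    """
    return {pid: "+".join(sorted(labels)) for pid, labels in pages.items()}


binary_targets = binary_transformation(PAGES)
multiclass_targets = multiclass_transformation(PAGES)
```

In this toy corpus the binary transformation yields three separate problems (`blog`, `faq`, `shop`), while the multi-class transformation yields one problem with four classes, two of which (`blog+shop`, `faq+shop`) stand for multi-genre pages. Combination classes of this kind tend to have few training examples each.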
Similar resources
A Combination based on OWA Operators for Multi-label Genre Classification of web pages Una combinación basada en operadores OWA para la Clasificación de Género Multi-etiqueta de páginas web
This paper presents a new method for genre identification that combines homogeneous classifiers using OWA (Ordered Weighted Averaging) operators. Our method uses character n-grams extracted from different information sources such as URL, title, headings and anchors. To deal with the complexity of web pages, we applied MLKNN as a multi-label classifier, in which a web page can be affected by mor...
Single and Multi Column Neural Networks for Content-based Music Genre Recognition
This working note reports approaches of team KART to MediaEval2017 AcousticBrainz Genre Task and their results. To solve the problem, we mainly considered the sparsity and noise of data, network design for the multi-label classification, and implementation of successful Deep Neural Network (DNN) models. We propose three steps of preprocessing and depict two different approaches: a single-column...
Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems
We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished referen...
Web-Mediated Genres – A Challenge to Traditional Genre Theory
This paper explores the possibility of extending the functional genre model to account for non-linear, multi-modal, web-mediated documents. It adds a two-dimensional perspective to the genre analysis model in order to account for the fact that web documents not only act as text but also as medium. A substantial part of the paper is devoted to a discussion of the function of links; mainly becaus...
Journal title:
- JLCL
Volume 24
Publication date: 2009